Selecting the N-Top Retrieval Result Lists for an Effective Data Fusion

Authors

  • Antonio Juárez-González
  • Manuel Montes-y-Gómez
  • Luis Villaseñor Pineda
  • David Pinto
  • Manuel Alberto Pérez-Coutiño
Abstract

Although data fusion has yielded good results in information retrieval in the majority of cases, its effectiveness has been observed to depend on the quality of the input result lists. To tackle this problem, in this paper we explore combining only the n-top result lists as an alternative to fusing all available data. In particular, we describe a heuristic measure based on redundancy and ranking information to evaluate the quality of each result list and, consequently, to select the presumably n-best lists per query. Preliminary results on four IR test collections, containing a total of 266 queries, and employing three different DF methods are encouraging. They indicate that the proposed approach could significantly outperform the results achieved by fusing all available lists, showing improvements in mean average precision of 10.7%, 3.7% and 18.8% when used with the Maximum RSV, CombMNZ and Fuzzy Borda methods.
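The general idea of fusing only a selected subset of result lists can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the formula of their heuristic, so `list_quality` below is a hypothetical stand-in that simply rewards documents ranked near the top of a list that are also retrieved by other lists. `comb_mnz` follows the commonly described form of CombMNZ (sum of normalized scores multiplied by the number of lists that retrieved the document); the min-max normalization, the `depth` cutoff, and all function names are assumptions made for this example.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Scale one list's scores to [0, 1]; a constant-score list maps to 1.0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def comb_mnz(result_lists):
    """CombMNZ: summed normalized score times the number of lists with the doc."""
    total = defaultdict(float)
    hits = defaultdict(int)
    for scores in result_lists:
        for doc, s in min_max_normalize(scores).items():
            total[doc] += s
            hits[doc] += 1
    fused = {doc: total[doc] * hits[doc] for doc in total}
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(fused, key=lambda d: (-fused[d], d))


def list_quality(candidate, others, depth=10):
    """HYPOTHETICAL quality score (not the paper's measure).

    Combines redundancy (how many other lists contain a document) with
    ranking information (documents at better ranks weigh more).
    """
    ranked = sorted(candidate, key=lambda d: (-candidate[d], d))[:depth]
    quality = 0.0
    for rank, doc in enumerate(ranked, start=1):
        redundancy = sum(1 for other in others if doc in other)
        quality += redundancy / rank
    return quality


def fuse_top_n(result_lists, n):
    """Keep the n lists with the highest quality score, then fuse them."""
    scored = []
    for i, candidate in enumerate(result_lists):
        others = result_lists[:i] + result_lists[i + 1:]
        scored.append((list_quality(candidate, others), i))
    best = sorted(scored, key=lambda t: (-t[0], t[1]))[:n]
    return comb_mnz([result_lists[i] for _, i in best])


# Toy data: each result list maps a document id to its retrieval score.
lists = [
    {"d1": 0.9, "d2": 0.5, "d3": 0.1},
    {"d1": 0.8, "d3": 0.6, "d4": 0.2},
    {"d7": 0.9, "d8": 0.4},  # shares no documents with the other lists
]
print(fuse_top_n(lists, n=2))
```

In this toy example the third list has no overlap with the others, so it receives the lowest score and is left out of the fusion when `n=2`. Any other fusion method could be substituted for `comb_mnz` without changing the selection step.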

Similar resources

Effective Learning to Rank Persian Web Content

Persian is one of the most widely used languages on the Web. Hence, the Persian Web includes invaluable information that needs to be retrieved effectively. As with other languages, ranking algorithms for Persian Web content face various challenges, such as applicability issues in real-world situations as well as the lack of user modeling. CF-Rank, as a ...

Improved Skips for Faster Postings List Intersection

Information retrieval can be achieved through computerized processes by generating a list of relevant responses to a query. The document processor, matching function and query analyzer are the main components of an information retrieval system. Document retrieval systems are fundamentally based on Boolean, vector-space, probabilistic, and language models. In this paper, a new methodology for mat...

Monolingual Experiments with Far-East Languages in NTCIR-6

This paper describes our third participation in an evaluation campaign involving the Chinese, Japanese and Korean languages (NTCIR-6). Our participation is motivated by three objectives: 1) study the retrieval performances of various probabilistic and language models for these languages; 2) compare the relative retrieval effectiveness of a combined “unigram & bigram” indexing scheme combined wi...

Segmentation of Search Engine Results for Effective Data-Fusion

Metasearch and data-fusion techniques combine the rank lists of multiple document retrieval systems with the aim of improving search coverage and precision. We propose a new fusion method that partitions the rank lists of document retrieval systems into chunks. The size of chunks grows exponentially in the rank list. Using a small number of training queries, the probabilities of relevance of do...

Publication date: 2010